# Large-scale Image Understanding
Vit So400m Patch14 Siglip 378.webli
Apache-2.0
A Vision Transformer (ViT) model based on SigLIP. It contains only the image encoder and uses the original attention pooling mechanism.
Image Classification
Transformers

timm
82
0
Vit Large Patch14 Clip 224.laion2b
Apache-2.0
A Vision Transformer (ViT) model based on the CLIP architecture, specialized for image feature extraction.
Image Classification
Transformers

timm
502
0
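
The two model names listed above appear to encode the patch size (`Patch14`) and the input resolution (`378`, `224`). As a rough sketch, assuming the trailing number is the square input size in pixels and that the image is split into non-overlapping square patches, the number of patch tokens each encoder processes can be estimated from the name alone. The helper `patch_grid` is hypothetical and is not part of timm.

```python
import re


def patch_grid(model_name: str) -> tuple[int, int, int]:
    """Return (patch_size, image_size, num_patches) parsed from a ViT model name.

    Assumption: the name contains "patch<N>" and ends (before the dot-separated
    pretrained tag) with the square input resolution in pixels.
    """
    # Normalize listing-style names ("Vit Large Patch14 Clip 224.laion2b")
    # to identifier style ("vit_large_patch14_clip_224.laion2b").
    ident = model_name.strip().lower().replace(" ", "_")
    arch = ident.split(".", 1)[0]  # drop the pretrained tag, e.g. ".webli"

    patch = re.search(r"patch(\d+)", arch)
    size = re.search(r"_(\d+)$", arch)
    if patch is None or size is None:
        raise ValueError(f"cannot parse patch/image size from {model_name!r}")

    patch_size, image_size = int(patch.group(1)), int(size.group(1))
    if image_size % patch_size != 0:
        raise ValueError("image size is not a multiple of the patch size")

    side = image_size // patch_size  # patches along one edge
    return patch_size, image_size, side * side


if __name__ == "__main__":
    for name in (
        "Vit So400m Patch14 Siglip 378.webli",
        "Vit Large Patch14 Clip 224.laion2b",
    ):
        print(name, patch_grid(name))
```

Under these assumptions, the higher-resolution SigLIP encoder handles roughly three times as many patch tokens per image as the 224-pixel CLIP encoder, which is worth keeping in mind when comparing memory and latency. The count excludes any extra tokens (such as a class token) a given architecture may add.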